18 Transcriptomics and Proteomics
The closer the partition coefficient is to unity, the “harder” (i.e., the better separated)
the clustering.
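As a minimal sketch, assuming that the partition coefficient meant here is the one commonly used in fuzzy clustering, F = (1/N) Σ_i Σ_k u_ik², where u_ik is the degree of membership of object i in cluster k and each object's memberships sum to one, the behaviour described above can be checked numerically. The function name and the example matrices are illustrative and are not taken from the text.

```python
import numpy as np

def partition_coefficient(U):
    """Mean over objects of the sum of squared memberships.

    U is an (N objects) x (c clusters) membership matrix whose rows sum to 1
    (an assumption of this sketch). The result lies between 1/c and 1.
    """
    U = np.asarray(U, dtype=float)
    return float(np.sum(U ** 2) / U.shape[0])

# Hard clustering: every object belongs wholly to one cluster.
hard = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
# Maximally fuzzy clustering: every object is shared equally between clusters.
fuzzy = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

f_hard = partition_coefficient(hard)    # 1.0
f_fuzzy = partition_coefficient(fuzzy)  # 0.5, i.e. 1/c for c = 2
```

Under this assumed definition, a value of unity corresponds to a perfectly hard partition, and the lower bound 1/c to memberships spread evenly over all c clusters.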
Instead of using a clustering approach, the dimensionality of expression space can
be reduced by principal component analysis (PCA), in which the original dataset is
projected onto a small number of orthogonal axes. The original axes are rotated until
there is maximum variation of the points along one direction. This becomes the first
principal component. The second is the axis along which there is maximal residual
variation, and so on (see also Sect. 13.2.2).
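The rotation described above can be sketched with a singular value decomposition of the mean-centred data, which is one common way of computing principal components. The data below are synthetic, and the function name `pca` and the array shapes are assumptions made for illustration only.

```python
import numpy as np

def pca(X, n_components=2):
    """Project the rows of X (samples x variables) onto the leading principal axes."""
    Xc = X - X.mean(axis=0)                  # centre each variable on its mean
    # The rows of Vt are orthonormal axes ordered by decreasing variance.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    axes = Vt[:n_components]                 # the first n_components principal axes
    scores = Xc @ axes.T                     # coordinates of each sample on those axes
    explained = s ** 2 / np.sum(s ** 2)      # fraction of total variance per axis
    return scores, axes, explained[:n_components]

rng = np.random.default_rng(0)
# Synthetic "expression" data: 50 samples, 5 variables, with most of the
# variation lying along a single hidden direction plus a little noise.
latent = rng.normal(size=(50, 1))
direction = np.array([[3.0, 2.0, 1.0, 0.5, 0.1]])
X = latent @ direction + 0.1 * rng.normal(size=(50, 5))

scores, axes, explained = pca(X, n_components=2)
```

Here `explained[0]` is the fraction of the total variance captured by the first principal component; for data of this kind it is close to one, so a single axis summarizes most of the variation.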
Limitations and Alternatives
Microarrays have limitations, and the following potential sources of problems
should be noted: manufacturing reproducibility; variation in how the experiments
are carried out [exposure duration (is equilibrium reached?), temperature
gradients, flow conditions, and so on, all of which may severely affect the actual
amounts hybridized]; ambiguity between preprocessed and postprocessed (spliced)
mRNA; mRNA fragment size distribution not matching that of the probes;
quantitative interpretation of the data; and expense. Attempts are being made to
introduce globally uniform standards, such as the minimum information about a
microarray experiment (MIAME), in order to make comparison between different
experiments possible.
Other techniques have been developed, such as serial analysis of gene expression
(SAGE). In this technique, a short but unique sequence tag is generated from the
mRNA of each gene using PCR (Sect. 17.1.2), and the tags are joined together
(“concatemerized”). The concatemer is then sequenced. The number of times each
tag occurs in the sequence is proportional to the expression level of the
corresponding gene.
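The counting step can be sketched as follows. This is a deliberately simplified illustration: it assumes that the tags all have the same length and are joined end to end with nothing between them, which ignores the details of real SAGE concatemers. The function names and the four-base example tags are hypothetical.

```python
from collections import Counter

def count_tags(concatemer, tag_length):
    """Split a sequenced concatemer into consecutive fixed-length tags and count them."""
    if len(concatemer) % tag_length != 0:
        raise ValueError("concatemer length is not a multiple of the tag length")
    tags = [concatemer[i:i + tag_length]
            for i in range(0, len(concatemer), tag_length)]
    return Counter(tags)

def relative_expression(counts):
    """Convert tag counts to fractions of the total number of tags."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

# Toy concatemer built from three distinct (unrealistically short) tags.
concatemer = "ACGTACGTTTGAGGCCACGTGGCC"
counts = count_tags(concatemer, tag_length=4)
fractions = relative_expression(counts)
# counts: ACGT occurs 3 times, GGCC twice, TTGA once
```

Each tag would then be looked up against known sequences to identify the gene from which it originates; that mapping step is not shown here.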
The transcription products of many closely related genes such as those originating
from alternative mRNA splicing (Sect. 14.8.5) may be difficult to distinguish using
standard microarray techniques; efforts to overcome that problem include the use of
bundles of tens of thousands of optical fibres, to the ends of which thousands of glass
beads, each loaded with a particular DNA sequence, are fixed. 8 Since the beads are
comparable in size (a few micrometres in diameter) with the optical fibre cores, each
fibre will carry at most one active bead. Each fibre is individually addressable and the
DNA sequence associated with it is first identified using fluorescent complementary
DNA fragments. The attraction of the technique is the enhanced sensitivity.
Problem. How many n-mers are needed to unambiguously identify g genes?
8 Yeatley et al. (2002). These researchers combined their fibre optic array with the technique of RNA-
mediated annealing, selection, and ligation (RASL), in which the mRNAs produced in a particular
cell type are extracted and mixed with DNA oligomers whose sequences are complementary to
those at which two RNA sections could be joined by splicing (“splice junctions”); the presence of
a particular splice junction leads to binding of the DNA oligomers, which can then be multiplied,
fluorescently labelled and exposed to the optical fibre array with which the sequences can be
identified.